{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Off Policy Evaluation And Causal Inference"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (a) Importance Sampling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If $\\hat \\pi_0 = \\pi_0$, then\n",
    "$$\\begin{align*}\n",
    "\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}R(s,a) &=\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\frac{\\pi_1(s,a)}{ \\pi_0(s,a)}R(s,a)\\\\\n",
    "&=\\sum_{(s,a)} \\frac{\\pi_1(s,a)}{ \\pi_0(s,a)}R(s,a) p(s)\\pi_0(s,a) \\\\\n",
    "&=\\sum_{(s,a)} R(s,a) p(s)\\pi_1(s,a) \\\\\n",
    "&=\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_1(s,a)}} R(s,a).\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (b) Weighted Importance Sampling\n",
    "\n",
    "If $\\hat \\pi_0 = \\pi_0$, then"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\begin{align*}\n",
    "\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)} &= \\sum_{(s,a)}\\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}p(s)\\pi_0(s,a) \\\\\n",
    "&= \\sum_{(s,a)} p(s,a) = 1.\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hence by (a)\n",
    "$$\\frac{\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}R(s,a)} {\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}}=  \\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_1(s,a)}}  R(s,a).$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (c)\n",
    "If there is only one data element $(s,a,R(s,a))$ in the observational dataset, then the estimation given by the weighted importance sampling estimator equals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\frac {\\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}R(s,a)} { \\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}} = R(s,a) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore the expected value (under the sampling distribution) of the weighted importance sampling estimator is \n",
    "$$\n",
    "\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} R(s,a),$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "which in general will be different from \n",
    "$$ \\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_1(s,a)}} R(s,a),$$\n",
    "i.e. the weighted importance sampling estimator is biased in this case."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (d) Doubly Robust"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**i.**\n",
    "\n",
    "We have "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\begin{align*}\n",
    "\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\mathbb E_{a'\\sim \\pi_1(s,a)} \\hat R(s,a') &= \\sum_{s,a}p(s)\\pi_0(s,a)\\sum_{a'}\\pi_1(s,a')\\hat R(s,a')\\\\\n",
    "&= \\sum_{s,a'}p(s)\\pi_1(s,a')\\hat R(s,a') \\sum_{a}\\pi_0(s,a)\\\\\n",
    "&= \\sum_{s,a'}p(s)\\pi_1(s,a')\\hat R(s,a')\\\\\n",
    "&= \\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_1(s,a)}} \\hat R(s,a).\n",
    "\\end{align*}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly to (a) we also have \n",
    "$$\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}(R(s,a)-\\hat R(s,a)) =\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_1(s,a)}} (R(s,a)-\\hat R(s,a))$$\n",
    "if $\\hat\\pi_0 = \\pi_0$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Therefore by additivity of the expected value the claim follows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**ii.**\n",
    "\n",
    "The first equation from i. above still holds. Therefore if $\\hat R = R$, then \n",
    "$$\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\mathbb E_{a'\\sim \\pi_1(s,a)} \\hat R(s,a') = \\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_1(s,a)}} R(s,a).\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also in this case we have \n",
    "$$\\mathbb E_{\\substack{s\\sim p(s) \\\\ a \\sim \\pi_0(s,a)}} \\frac{\\pi_1(s,a)}{\\hat \\pi_0(s,a)}(R(s,a)-\\hat R(s,a)) = 0.$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From this the claim follows again by additivity of the expected value."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (e) \n",
    "\n",
    "**i.** If the interaction between drug, patient and lifespan is very complicated, the importance sampling estimator probably gives better results than the regression estimator, because it doesn't need to learn an estimator $\\hat R(s,a)$ for the complicated function $R(s,a)$.\n",
    "\n",
    "**ii.** In this case the regression estimator seems to be a better choice, because unlike the importance estimator it doesn't have to find an estimate $\\hat \\pi_0$ for the complicated baseline policy $\\pi_0$."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
